Nasal DNA methylation at three CpG sites predicts childhood allergic disease

Childhood allergic diseases, including asthma, rhinitis and eczema, are prevalent conditions that share strong genetic and environmental components. Diagnosis relies on clinical history and measurements of allergen-specific IgE. We hypothesize that a multi-omics model could accurately diagnose childhood allergic disease. We show that nasal DNA methylation has the strongest predictive power to diagnose childhood allergy, surpassing blood DNA methylation, genetic risk scores, and environmental factors. DNA methylation at only three nasal CpG sites classifies allergic disease in Dutch children aged 16 years well, with an area under the curve (AUC) of 0.86. This is replicated in Puerto Rican children aged 9–20 years (AUC 0.82). DNA methylation at these CpGs additionally detects allergic multimorbidity and symptomatic IgE sensitization. Using nasal single-cell RNA-sequencing data, these three CpGs associate with influx of T cells and macrophages that contribute to allergic inflammation. Our study suggests the potential of methylation-based allergy diagnosis.


Supplementary Figure 2. Precision-recall curves (PRC) for the 3-CpG sites model
Based on the 30 CpG sites model. These sites are used as model variables for an Elastic Net, which is estimated in the standard 10-times repeated 10-fold cross-validation framework. The model was tuned using the same parameters as for the discovery cohort. (a) ROC curve, (b) PRC.
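The resampling scheme named in the caption can be sketched as follows. This is a minimal illustration of 10-times repeated 10-fold cross-validation only: the function name `repeated_kfold`, the seed and the round-robin fold assignment are assumptions, the folds are not stratified by case status, and the Elastic Net fit itself is omitted. The sample size of 348 is the number of samples reported after matching all data layers.

```python
import random

def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        # Deal the shuffled indices round-robin so fold sizes differ by at most one.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for fold, test_idx in enumerate(folds):
            test_set = set(test_idx)
            train_idx = sorted(i for i in indices if i not in test_set)
            yield repeat, fold, train_idx, sorted(test_idx)

# 10 repeats x 10 folds give 100 train/test splits over the 348 samples.
splits = list(repeated_kfold(n_samples=348))
print(len(splits))
```

Within each repeat every sample is held out exactly once, so averaging the per-fold performance over all splits uses each sample as test data once per repeat.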

Supplementary Figure 4. Age-stratified replication of 3 CpG model in EVA-PR cohort
Based on the 3-CpG model that was used for the overall replication, the same model has been used to predict allergic disease within different age groups in the EVA-PR cohort (9-14, 15-17, 18-20) to assess age-dependent performance via ROC (a) and PRC (b).
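An age-stratified ROC evaluation of this kind can be sketched as below. The helper names `roc_auc` and `auc_by_age` and all toy values are made up for illustration; only the three age bands come from the caption. The AUC is computed as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(case score > control score); ties count as 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        return None  # AUC is undefined unless both classes are present
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

AGE_GROUPS = [(9, 14), (15, 17), (18, 20)]  # age bands named in the caption

def auc_by_age(ages, labels, scores):
    """Apply one fixed set of predictions to each age band separately."""
    result = {}
    for low, high in AGE_GROUPS:
        idx = [i for i, age in enumerate(ages) if low <= age <= high]
        result[f"{low}-{high}"] = roc_auc([labels[i] for i in idx],
                                          [scores[i] for i in idx])
    return result

# Toy data (invented): age in years, case status, predicted probability.
ages   = [9, 12, 14, 13, 15, 16, 17, 15, 18, 19, 20, 18]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.7, 0.8, 0.1, 0.3, 0.5, 0.9, 0.2]

per_group = auc_by_age(ages, labels, scores)
print(per_group)
```

The model is not refitted per stratum; the same predicted scores are simply evaluated within each age band.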

Supplementary Figure 9. Venn diagram of comorbidity
Diagrams show the PIAMA samples with both allergy symptoms and IgE sensitization (the allergy phenotype used in this study) (a), and samples with allergy symptoms but without IgE sensitization (b).

Study and population description of discovery cohort
The discovery analysis was performed in the PIAMA birth cohort (Prevention and Incidence of Asthma and Mite Allergy). In this study, a case with allergic disease is a child with at least one of three allergic diseases (asthma, rhinitis or eczema) who is also specific IgE positive (>0.35 IU/mL) to any allergen (house dust mite, cat, Dactylis (grass) or birch). Asthma was defined as the presence of at least 2 of the following 3 criteria: 1) doctor-diagnosed asthma ever; 2) wheeze in the last 12 months; and 3) prescription of asthma medication in the last 12 months. Rhinitis was defined as the presence of sneezing or a runny/blocked nose without having a cold in the last 12 months, with nose symptoms accompanied by itchy, watering eyes. Eczema was defined as a positive answer to the question: has your child ever had an itchy rash which was coming and going in the last 12 months? If yes: has this itchy rash affected any of the following places: the folds of the elbows, behind the knees, in front of the ankles, or around the neck, Table 8).
In the sensitivity analysis of the polygenic risk score (PRS), because the prediction performance of genetics may have been limited by the small sample size, we ran a sensitivity analysis in a larger PIAMA dataset that had both genotype and phenotype data at 16 years of age (N = 675), regardless of the availability of other data layers. In the eQTM analysis, we included 244 participants with both nasal DNA methylation and nasal RNA-seq data available. In the meQTL analysis, we included 422 participants with both genotype and nasal DNA methylation data.
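The case definition above can be written as a small decision rule. The sketch below is illustrative only: the `Child` record, its field names and the helper functions are hypothetical, and rhinitis and eczema are taken as already-derived questionnaire flags rather than rebuilt from the individual questions. Only the 2-of-3 asthma rule and the 0.35 IU/mL threshold come from the text.

```python
from dataclasses import dataclass, field

IGE_CUTOFF = 0.35  # IU/mL, threshold stated in the text

@dataclass
class Child:
    # Hypothetical record layout, for illustration only.
    doctor_diagnosed_asthma_ever: bool = False
    wheeze_last_12m: bool = False
    asthma_medication_last_12m: bool = False
    rhinitis: bool = False   # assumed to be pre-derived from the questionnaire
    eczema: bool = False     # assumed to be pre-derived from the questionnaire
    specific_ige: dict = field(default_factory=dict)  # allergen -> IU/mL

def has_asthma(child):
    """Asthma requires at least 2 of the 3 listed criteria."""
    criteria = [child.doctor_diagnosed_asthma_ever,
                child.wheeze_last_12m,
                child.asthma_medication_last_12m]
    return sum(criteria) >= 2

def is_sensitized(child):
    """Specific IgE above the cutoff for any tested allergen."""
    return any(level > IGE_CUTOFF for level in child.specific_ige.values())

def is_allergic_case(child):
    """At least one allergic disease AND specific IgE positivity."""
    has_disease = has_asthma(child) or child.rhinitis or child.eczema
    return has_disease and is_sensitized(child)

example = Child(wheeze_last_12m=True, asthma_medication_last_12m=True,
                specific_ige={"house dust mite": 1.2, "cat": 0.1})
print(is_allergic_case(example))  # True: asthma (2 of 3 criteria) plus IgE > 0.35
```

A child with symptoms but no specific IgE above the cutoff is not a case under this rule, which matches the distinction drawn in Supplementary Figure 9.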

Genotype data
Genome-wide genotyping was performed in four phases. Quality control (QC) for each phase was performed and then the data were merged together.

DNA methylation data
We collected samples of peripheral blood, and of nasal epithelial cells by brushing. Briefly, the right nostril of the subjects was examined and the inferior turbinate was located using a speculum and penlight. Brushing was performed with a Cytosoft brush CP-5B (Cyto-Pak) after local anesthesia with 1% lidocaine spray. The lateral area underneath the inferior turbinate was then brushed for 3 seconds and the brush was placed in a 2 ml screw-cap Eppendorf tube and put into a freezer at -80°C until further processing. In total 4 brushes (2 for DNA isolation and 2 for RNA isolation) were collected.
DNA from whole blood was extracted using the QIAamp blood kit (Qiagen Benelux BV, Venlo, the Netherlands) and DNA from nasal epithelium samples was extracted using the DNA Investigator kit (Qiagen Benelux BV). DNA concentration was determined by Nanodrop measurement and Picogreen quantification. 500 ng of DNA was bisulfite-converted using the EZ 96-DNA methylation kit (Zymo Research, Irvine, CA, USA), following the manufacturer's standard protocol. After verifying the bisulfite conversion step using Sanger sequencing, DNA concentration was normalized and the samples were randomized to avoid batch effects.
One standard DNA sample per chip was included in this step for QC purposes.
Blood and nasal DNA samples were hybridized to the Infinium HumanMethylation450 BeadChip array (Illumina, San Diego, CA, USA). DNA methylation data were pre-processed in R with the Bioconductor package minfi 4, using the original IDAT files extracted from the HiScanSQ scanner. We implemented sample filtering to remove poor-quality samples (call rate <99%). Furthermore, we used the 65 SNP probes to check for concordance between paired nasal brush and blood samples from the same individuals. Paired samples whose SNP signals had a Pearson correlation coefficient <0.9 were regarded as sample mix-ups and were excluded from the study. We also checked the methylation distribution of the X chromosome to verify sex. During processing, the probes on sex chromosomes, the probes that mapped to multiple loci, the 65 SNP probes and the probes containing SNPs at the target CpGs with a MAF >5% were excluded 5. We implemented "DASEN" 6 to perform signal correction and normalization. After QC, 640 blood samples, 478 nasal samples, and 436,824 probes remained; after matching with data from all the available layers, 348 samples were used for the analyses.
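The paired-sample concordance check can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy beta values are invented, six probes stand in for the 65 SNP probes, and only the Pearson threshold of 0.9 comes from the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def flag_mixups(blood, nasal, threshold=0.9):
    """Return IDs whose blood and nasal SNP-probe signals correlate below threshold."""
    shared_ids = sorted(blood.keys() & nasal.keys())
    return [pid for pid in shared_ids
            if pearson(blood[pid], nasal[pid]) < threshold]

# Toy SNP-probe beta values (invented); real data would have 65 probes per sample.
blood = {"A": [0.02, 0.51, 0.97, 0.49, 0.03, 0.95],
         "B": [0.98, 0.05, 0.50, 0.96, 0.52, 0.04]}
nasal = {"A": [0.04, 0.48, 0.95, 0.52, 0.05, 0.97],
         "B": [0.03, 0.97, 0.49, 0.05, 0.51, 0.96]}

mixups = flag_mixups(blood, nasal)
print(mixups)  # individual "B" has discordant genotype signals across tissues
```

Because SNP-probe signals reflect genotype rather than tissue-specific methylation, two samples from the same person should correlate strongly regardless of tissue.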

External replication
We first replicated our model in a cohort of comparable age but different ethnicity: Epigenetic Variation and Childhood Asthma in Puerto Ricans (EVA-PR).

EVA-PR
The Epigenetic Variation and Childhood Asthma in Puerto Ricans (EVA-PR) is a case-control study of asthma in subjects aged 9-20 years; cohort recruitment, procedures, and methods have been described previously 11,12 . Briefly, participants with and without asthma were removed using an empirical Bayes framework, and sva was used to estimate latent factors (LFs) that capture unknown data heterogeneity. To account for population stratification, we also adjusted our models for principal components derived from genotype data (using Illumina HumanOmni2.5 BeadChips).
We also tested whether the model extends to other ages, in particular to two younger cohorts of children aged around 6 years: the Copenhagen Prospective Studies on Asthma in Childhood (COPSAC) 13 and the Dutch MAKI trial 14.

COPSAC
The COPSAC2010 cohort is an ongoing, prospective mother-child cohort comprising 731 children born to unselected mothers from Zealand, Denmark, in 2009-2010, as described previously 13. At week 24 of pregnancy, women were randomly assigned to receive n-3 long-chain polyunsaturated fatty acids (n = 362) or placebo (n = 369). Baseline information as well as details of this randomized controlled trial have been published 15.
The study was conducted in accordance with the guiding principles of the Declaration of Helsinki. Asthma, eczema and rhinitis were diagnosed longitudinally by research physicians following the same standardized algorithm. An asthma diagnosis required a certain burden of troublesome lung symptoms, response to treatment with inhaled corticosteroids and relapse after withdrawal of treatment, as reported previously 15. In this replication, current asthma by age 6 years was defined as still having asthma symptoms and needing regular treatment with inhaled corticosteroids in the previous year.
Rhinitis is defined as the presence of sneezing or a runny, itchy or stuffed nose in the absence of infection. Specifically, besides absence of infection, a case required two or more of the following symptoms, experienced for more than one hour on more than two days: runny nose, sneezing, itchy or blocked nose. Children who did not meet these criteria were still included as cases if the symptoms persisted for more than one season or were relieved by antihistamines. Rhinitis symptoms were evaluated at 6 years of age.
Eczema is defined as positive according to the definitions in Hanifin & Rajka. We required three major and three minor features for a child to be regarded as a case 16. Eczema symptoms were evaluated at 6 years of age. Nasal epithelial cells were collected by brushing around age 6. Briefly, the right nostril of the subjects was examined and the inferior turbinate was located using a speculum and penlight. The lateral area underneath the inferior turbinate was then brushed for 3 seconds with two brushes (Copan, 56380CS01 FLOQswabs) and these were placed in a 2 ml screw-cap Eppendorf tube and put into a freezer at -80°C until further processing.

Sensitization (sIgE) is defined as specific
DNA was extracted from nasal brushes using the DNA investigator kit (Qiagen, Benelux BV, Venlo, the Netherlands). This was followed by precipitation-based purification and concentration using GlycoBlue (Ambion). 500 ng of DNA was bisulfite-converted using the EZ 96-DNA methylation kit (Zymo Research), following the manufacturer's standard protocol. After verification of the bisulfite conversion step using Sanger Sequencing, DNA concentration was normalized and the samples were randomized to avoid batch effects.
One standard DNA sample per chip was included in this step for quality control. Study personnel and technicians were blinded for the intervention.
In total, 296 nasal epithelium samples with sufficient DNA quality and quantity were probes remained for further analyses.

Nasal single cell RNA-seq cohort
Nasal brush samples from 4 asthma patients and 5 healthy controls were collected from the inferior turbinate with a Cytosoft brush CP-5B (Cyto-Pak). The brushes were collected in a 50 mL tube containing HBSS (Lonza) + 1% Pen/Strep. Cells were spun down at 560 × g for 5 min.
The cell pellet was then resuspended in HBSS containing 1 mg/ml Collagenase D and 0.1 mg/ml DNase I (Roche) and placed at 37 °C for 1 hour with gentle agitation. The cell suspension was pushed through a 70 µm nylon cell strainer (Falcon) and spun down at 560 × g for 5 min.
Next, cells were washed with PBS containing 1% BSA (Sigma Aldrich). The single-cell suspension was cleared of red blood cells using a red blood cell lysis buffer (eBioscience).
The cell suspension was counted manually using a haemocytometer and the concentration was adjusted to a minimum of 300 cells/µl. Cells were loaded according to the standard protocol of the Chromium Single Cell 3' kit. The following steps were performed according to the Single Cell 3' Reagent Kits v2 User Guide. We performed RNA sequencing on an Illumina HiSeq 4000 or NovaSeq 6000, aiming for a mean coverage of 50k-100k reads/cell. 10X Genomics raw sequencing data were processed using CellRanger software and the 10X human genome

Mediation analysis
To evaluate what proportion, if any, of an association between SNP and asthma was mediated through changes in the DNA methylation level of the three CpG sites, we conducted mediation testing using the R package mediation 19.
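The idea of a "proportion mediated" can be illustrated with a simplified product-of-coefficients calculation. This sketch is not the algorithm of the R package mediation: it assumes linear models and a continuous outcome (asthma is binary in the study), all function names and toy values are invented, and it produces no confidence intervals or significance test.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented system (X'X | X'y).
    a = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[j][p] / a[j][j] for j in range(p)]

def proportion_mediated(snp, methylation, outcome):
    """Indirect effect (a * b) divided by total effect (direct + indirect)."""
    a = ols([snp], methylation)[1]                     # SNP -> methylation
    _, direct, b = ols([snp, methylation], outcome)    # direct SNP effect, methylation effect
    indirect = a * b
    return indirect / (direct + indirect)

# Toy data (invented): genotype dosage, CpG methylation, continuous outcome.
snp         = [0, 0, 1, 1, 2, 2]
methylation = [0.55, 0.45, 0.45, 0.35, 0.35, 0.25]
outcome     = [-0.1, 0.1, 0.4, 0.6, 0.9, 1.1]

share = proportion_mediated(snp, methylation, outcome)
print(round(share, 3))
```

In the toy data the SNP lowers methylation and lower methylation raises the outcome, so part of the SNP effect travels through the CpG while the remainder is the direct effect.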